-
Notifications
You must be signed in to change notification settings - Fork 425
feat: Add support for rolling back to timestamp #2879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like Test Infra failed need to retrigger tests. |
jayceslesar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a couple of suggestions and the tests also look good!
| def latest_ancestor_before_timestamp(table_metadata: TableMetadata, timestamp_ms: int) -> Snapshot | None: | ||
| """Find the latest ancestor snapshot whose timestamp is before the provided timestamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice from a users perspective to allow this to be a datetime as well? Does differ from the java impl though...
| for ancestor in ancestors_of(table_metadata.current_snapshot(), table_metadata): | ||
| if timestamp_ms > ancestor.timestamp_ms > result_timestamp: | ||
| result = ancestor | ||
| result_timestamp = ancestor.timestamp_ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about using a max(filter(...)) statement here? I think a little easier to follow than the double greater than expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not, this does match the java
|
|
||
| return self.set_current_snapshot(snapshot_id=snapshot_id) | ||
|
|
||
| def rollback_to_timestamp(self, timestamp_ms: int) -> ManageSnapshots: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could add support for datetime too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be useful helper, but i think we use timestamp_ms in most places right now
Co-authored-by: Chinmay Bhat <12948588+chinmay-bhat@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds functionality to rollback an Iceberg table to a specific point in time by providing a timestamp. The implementation finds the latest ancestor snapshot whose timestamp is strictly before the given timestamp and rolls back to it.
Changes:
- Added
latest_ancestor_before_timestamphelper function to find the appropriate snapshot given a timestamp - Added
rollback_to_timestampmethod to theManageSnapshotsAPI for timestamp-based rollback - Added comprehensive unit and integration tests for the new functionality
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pyiceberg/table/snapshots.py | Implements latest_ancestor_before_timestamp function to find the latest snapshot before a given timestamp |
| pyiceberg/table/update/snapshot.py | Adds rollback_to_timestamp method to ManageSnapshots class with proper error handling |
| tests/table/test_snapshots.py | Adds unit tests for latest_ancestor_before_timestamp covering various edge cases |
| tests/integration/test_snapshot_operations.py | Adds integration tests for rollback_to_timestamp including error cases and method chaining |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| yield from ancestors_of(to_snapshot, table_metadata) | ||
|
|
||
|
|
||
| def latest_ancestor_before_timestamp(table_metadata: TableMetadata, timestamp_ms: int) -> Snapshot | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we could refactor ancestors_of with an optional lambda. :P
|
|
||
| return self.set_current_snapshot(snapshot_id=snapshot_id) | ||
|
|
||
| def rollback_to_timestamp(self, timestamp_ms: int) -> ManageSnapshots: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be useful helper, but i think we use timestamp_ms in most places right now
|
Thanks @geruh for the PR and thanks for the review @nssalian, @jayceslesar, @.copilot |
Rationale for this change
This PR adds the ability to rollback a table to a ancestoral snapshot given a timestamp. Some of this work was also done in #758, and is a progress pr to be merged after #2871 & #2878. This is standalone from the other changes but it makes use of the helpers in the other prs.
Additionally, adding some more tests.
Are these changes tested?
Yes
Are there any user-facing changes?
New API for meta